Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Prior works showed that gradient-based training of neural networks can learn a single-index target f_*(x) = \sigma_*(\langle x, \theta\rangle) (under isotropic Gaussian data in d dimensions) with n \gtrsim d^{\Theta(p)} samples, where p is the information exponent of the link function \sigma_*, and such complexity is predicted to be necessary by the correlational statistical query lower bound. Surprisingly, we prove that a two-layer neural network optimized by an SGD-based algorithm (on the squared loss) learns f_* with a complexity that is not governed by the information exponent. Specifically, for arbitrary polynomial single-index models, we establish a sample and runtime complexity of n \simeq T = \Theta(d\cdot\mathrm{polylog}\, d), where \Theta(\cdot) hides a constant depending only on the degree of \sigma_*; this dimension dependence matches the information-theoretic limit up to polylogarithmic factors. More generally, we show that n \gtrsim d^{(p_*-1)\vee 1} samples are sufficient to achieve low generalization error, where p_* \le p is the \textit{generative exponent} of the link function. Core to our analysis is the reuse of minibatches in the gradient computation, which gives rise to higher-order information beyond correlational queries.
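To make the minibatch-reuse mechanism concrete, the following is a minimal sketch, not the authors' exact algorithm: a two-layer ReLU network trained on the squared loss by SGD, where each fresh minibatch is reused for several gradient steps before being discarded. The single-index target, the degree-3 polynomial link, and all dimensions and hyperparameters are illustrative assumptions.

# Minimal sketch (illustrative assumptions throughout) of SGD with minibatch reuse
# for learning a single-index target f_*(x) = sigma_*(<x, theta>) with a two-layer network.
import numpy as np

rng = np.random.default_rng(0)

d, m = 64, 128                          # input dimension, hidden width (illustrative)
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)          # hidden index direction (unit vector)

def sigma_star(z):                      # example polynomial link: 3rd Hermite polynomial
    return z**3 - 3.0 * z

def target(X):                          # single-index target f_*(x) = sigma_*(<x, theta>)
    return sigma_star(X @ theta)

# Two-layer network f(x) = a^T relu(W x), both layers trained by SGD on the squared loss.
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

def forward(X):
    h = np.maximum(X @ W.T, 0.0)        # hidden activations, shape (batch, m)
    return h, h @ a                     # network output, shape (batch,)

lr, batch_size, n_batches, reuse_steps = 1e-2, 256, 200, 4

for _ in range(n_batches):
    X = rng.standard_normal((batch_size, d))    # fresh isotropic Gaussian minibatch
    y = target(X)
    # Minibatch reuse: take several gradient steps on the SAME batch, so the batch
    # enters the gradient computation repeatedly rather than only once.
    for _ in range(reuse_steps):
        h, pred = forward(X)
        err = pred - y                                      # residual, shape (batch,)
        grad_a = h.T @ err / batch_size                     # gradient w.r.t. second layer
        grad_W = ((err[:, None] * (h > 0)) * a).T @ X / batch_size  # gradient w.r.t. first layer
        a -= lr * grad_a
        W -= lr * grad_W

# Evaluate on fresh samples.
X_test = rng.standard_normal((4096, d))
_, pred = forward(X_test)
print("test MSE:", np.mean((pred - target(X_test)) ** 2))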